HaoLap: A Hadoop based OLAP system for big data

Authors

  • Jie Song
  • Chaopeng Guo
  • Zhi Wang
  • Yichan Zhang
  • Ge Yu
  • Jean-Marc Pierson
Abstract

In recent years, facing the information explosion, industry and academia have adopted distributed file systems and the MapReduce programming model to address the new challenges that big data has brought. Based on these technologies, this paper presents HaoLap (Hadoop based oLap), an OLAP (OnLine Analytical Processing) system for big data. Drawing on the experience of Multidimensional OLAP (MOLAP), HaoLap adopts a specified multidimensional model to map the dimensions and the measures; a dimension coding and traversal algorithm to achieve the roll-up operation on dimension hierarchies; a partition and linearization algorithm to store dimensions and measures; a chunk selection algorithm to optimize OLAP performance; and MapReduce to execute OLAP. The paper illustrates the key techniques of HaoLap, including system architecture, dimension definition, dimension coding and traversal, partitioning, data storage, OLAP, and the data loading algorithm. We evaluated HaoLap on a real application and compared it with Hive, HadoopDB, HBaseLattice, and Olap4Cloud. The experimental results show that HaoLap boosts the efficiency of data loading and has a great advantage in OLAP performance with respect to data set size and query complexity, while also fully supporting dimension operations.

Similar resources

Cloud Computing Technology Algorithms Capabilities in Managing and Processing Big Data in Business Organizations: MapReduce, Hadoop, Parallel Programming

The objective of this study is to verify the importance of the capabilities of cloud computing services in managing and analyzing big data in business organizations because the rapid development in the use of information technology in general and network technology in particular, has led to the trend of many organizations to make their applications available for use via electronic platforms hos...

A Fuzzy TOPSIS Approach for Big Data Analytics Platform Selection

Big data sizes are constantly increasing. Big data analytics is where advanced analytic techniques are applied on big data sets. Analytics based on large data samples reveals and leverages business change. The popularity of big data analytics platforms, which are often available as open-source, has not remained unnoticed by big companies. Google uses MapReduce for PageRank and inverted indexes....

A Data Pre-partitioning and Distribution Optimization Approach for Distributed Data Warehouses

The increasing volumes of relational data lead us to seek an alternative way to cope with them. The Hadoop framework, an open source project based on the MapReduce paradigm, is a popular choice for distributed data warehouses and big data analytics. In this paper, we propose an original approach for partitioning and collocating data in distributed file systems, especially Hadoop-based systems, and this, ...

Sweet KIWI: Statistics-Driven OLAP Acceleration using Query Column Sets

KIWI is a SQL-on-Hadoop system enabling batch and interactive analytics for big data. In database systems, materialized views, stored pre-computed results for queries, are one of the most commonly used techniques to improve the query processing speed. However, the key challenge in using materialized views is maintaining their freshness as base data changes. This paper introduces a new approach ...

SAP HANA - From Relational OLAP Database to Big Data Infrastructure

SAP HANA started as one of the best-performing database engines for OLAP workloads strictly pursuing a main-memory centric architecture and exploiting hardware developments like large number of cores and main memories in the TByte range. Within this paper, we outline the steps from a traditional relational database engine to a Big Data infrastructure comprising different methods to handle data ...

Journal title:
  • Journal of Systems and Software

Volume 102  Issue 

Pages  -

Publication date 2015